Shapelet selection based on a genetic algorithm for remaining useful life prediction with supervised learning

RUL (remaining useful life) shapelets were recently developed to overcome the shortcomings of similarity-based RUL prediction methods, such as high sensitivity to parameters. RUL shapelets are informative subsequences whose distances to a run-to-failure time series sample are very useful for predicting the RUL of the sample. However, the prediction performance and interpretability depend heavily on the set of RUL shapelets, and it is very difficult to compose an optimized set. In this paper, we mathematically formalize the RUL shapelet composition problem with multiple objective functions. In addition, we analyze the characteristics of good RUL shapelet sets and develop a solution methodology based on a genetic algorithm. Through various experiments, we validate that the proposed method outperforms previous ones and provide guidelines for using it. The solution methodology developed in this paper can be applied to various RUL prediction problems. In addition, the findings on RUL shapelets can help researchers develop their own RUL shapelet-based solutions.


Introduction
The remaining useful life (RUL) of an engineering system is defined as the length of time from the current time to the time of failure. Accurate RUL prediction is essential for prognostics and health management (PHM) because the predicted RUL informs important decisions such as maintenance schedules. In other words, PHM objectives such as avoiding accidents, anticipating failures, and ensuring reliable operation and maintenance can be achieved through accurate RUL prediction (Zio, 2022; Zonta et al., 2020).
RUL prediction approaches can be categorized into physics-based and data-driven approaches. The former relies on developing degradation process models to estimate the RUL by using domain knowledge, such as system failure mechanisms, while the latter relies on developing data-driven models by discovering degradation patterns from previously observed data of the system and estimating RUL based on these patterns by using statistical machine learning (ML) or deep learning (DL) models (Liao and Kottig, 2014). In this paper, we focus on the data-driven approach with ML or DL models, which has attracted much attention from both academia and industry thanks to the recent development of data collection and processing techniques.
RUL predictions made using similarity-based models are interpretable through the neighbor windows on which they are based. In addition, these models are appropriate for dealing with run-to-failure data collected under various operating conditions. They are also easy to implement without domain knowledge or analysis of degradation trends (Cai et al., 2020), because they use similar historical data as references and rely only on the historical data itself. Finally, it is difficult to obtain enough degradation data in real-world applications (Ahn et al., 2021), but similarity-based methods have proven effective for predicting RUL with limited data (Lyu et al., 2020).
Owing to these advantages, these methods have been frequently addressed in the literature, e.g., Lyu et al. (2020), Liu et al. (2019), Bingjie et al. (2021), and Malinowski et al. (2015). For example, Lyu et al. (2020) proposed a similarity-based RUL prediction method based on dynamic time warping (DTW), a distance measure for time series samples of different lengths. Since DTW requires substantial computation time, they also introduced a coarse-to-fine strategy to find the neighbor windows efficiently. The RUL of a test window is estimated from its neighbor windows and then adjusted using strategies based on the degradation rate and the time gap. Liu et al. (2019) developed an RUL prediction method that consists of three steps: (1) health index construction, (2) similarity matching, and (3) RUL prediction. In the first step, every multivariate training sample is transformed into a univariate health index using principal component analysis (PCA), and a new sample is transformed in the same way. In the second step, every health index is divided into sliding windows of the same length, and the nearest neighbor of every window in the new sample is found among all windows of the training health indices based on the mean of the Euclidean similarity and the cosine similarity. In the third step, the RUL of each window in the new sample is estimated as the weighted mean of the RULs of its neighbor windows, where the weight is the sum of the normalized Euclidean and cosine similarities. Bingjie et al. (2021) employed the K-means clustering algorithm for similarity-based RUL prediction, considering that run-to-failure samples are collected under different operating conditions. That is, they use the algorithm to group similar training samples and find the closest cluster to each test sample. The training samples in the closest cluster are then used to estimate the RUL of the test sample using kernel density estimation.
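To make the DTW idea concrete, the sketch below implements only the textbook dynamic-programming recurrence for DTW between two univariate series of different lengths. It is not Lyu et al.'s coarse-to-fine search, and the absolute-difference local cost is an assumption made for illustration.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW distance between two univariate sequences of any lengths.

    Local cost is |a_i - b_j| (an assumption; squared differences are also common).
    Runs in O(len(a) * len(b)) time and memory.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Same shape, different lengths: the repeated 1 is absorbed by warping.
print(dtw_distance([0, 1, 2, 3], [0, 1, 1, 2, 3]))  # 0.0
```

Because warping lets one point align with several, two series with the same shape but different lengths can have distance zero, which a lock-step Euclidean comparison cannot express.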
Even though similarity-based methods have many advantages, they also have several critical disadvantages. For example, their prediction accuracy and interpretability are very sensitive to hyperparameters such as the window size, the number of neighbors, and the similarity measure. In addition, they usually use fixed windows to keep computational complexity low, which can miss potentially good windows. Using sliding windows instead, however, leads to long computation times.
In this regard, Malinowski et al. (2015) proposed a method based on RUL shapelets by employing the concept of shapelets. Shapelets are subsequences used for time series classification; the distances between the shapelets and a time series sample are used to determine its class (Ye and Keogh, 2009). Shapelets have been adopted in various applications owing to several advantages, including high accuracy, interpretability, and few hyperparameters. For example, Liu et al. (2015) applied shapelets to complex human activity recognition to address issues of portability, interpretability, and extensibility. As another example, AlDhanhani et al. (2019) used shapelets to represent traffic incident and congestion patterns for detecting traffic events. Finding optimal shapelets efficiently is very difficult, and several studies have addressed this problem. For example, Grabocka et al. (2014) applied stochastic gradient learning to find near-optimal shapelets without evaluating a large number of shapelet candidates.
RUL shapelets are subsequences whose distances to a time series can be used as a feature vector to predict RUL. Since RUL shapelets are more informative than windows and far fewer in number, using RUL shapelets is more effective and efficient than using similarity-based methods. Even though Malinowski et al. (2015) proposed the interesting concept of RUL shapelets, they did not analyze the properties of RUL shapelets or consider interactions among them. In addition, they predict the RUL as the mean RUL of time series samples after an RUL shapelet appears. Therefore, the properties of and interactions among RUL shapelets must be considered to use them effectively, and our research objective is to develop an RUL shapelet selection method that considers them.
For RUL shapelets to yield high accuracy, the distance between time series samples and shapelets should be proportional to the RULs. In addition, the number of shapelets should be small enough for interpretability and for a short estimation time. Finally, there should be no redundancy among the RUL shapelets, and positive interactions should exist among them. Here, redundant RUL shapelets are shapelets whose distances to time series samples are highly correlated with each other, and a positive interaction between two RUL shapelets means that the RUL can be estimated accurately only when both distances are considered. The estimation accuracy, interpretability, and estimation time clearly depend on the set of selected RUL shapelets, but previous research did not consider this. The genetic algorithm (GA) is one of the most widely used metaheuristic algorithms for various time series data analysis problems. It has been successfully applied to various optimization problems, including shapelet selection (Xue et al., 2020), which is similar to our problem.
The major contents and contributions of our research are as follows. First, we mathematically formulate the RUL shapelet selection problem as a feature selection problem with three objectives: (1) to minimize the error of RUL prediction, (2) to minimize the number of RUL shapelets, and (3) to minimize redundancy among the shapelets. To this end, we extend the concept of RUL shapelets to feature vectors, with which a machine learning model can be trained to capture interactions between RUL shapelets that appear at different locations of the time series. Second, we discuss the properties of good RUL shapelets, including their redundancy and interactions. The discussion can be summarized as follows: (1) selecting RUL shapelets based only on the correlation between the distances to RUL shapelets and the RULs may lead to focusing on the later part of the time series, (2) good RUL shapelets occur at similar locations in every sample, and (3) even good RUL shapelets in the same interval cause redundancy, which negatively impacts RUL prediction performance. Finally, we develop a GA to solve the formalized RUL shapelet selection problem. In particular, we focus on designing the initialization method of the GA based on the discussion of the properties of good RUL shapelets. We also design proper fitness functions for our problem.
The rest of this paper is organized as follows. Section 2 presents the preliminaries of the study, including shapelet discovery, RUL shapelets, and GA. Section 3 introduces and formulates the RUL shapelet selection problem, and Section 4 analyzes the properties of RUL shapelets and develops a GA-based RUL shapelet selection algorithm. Section 5 verifies the proposed algorithm through experiments. Finally, Section 6 concludes the research.

Shapelet discovery
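Let $x = (x_1, x_2, \dots, x_T)$ be a time series sample and $y \in \{1, 2, \dots, C\}$ be its label. We say $s = (s_1, s_2, \dots, s_L)$ is a subsequence of the time series dataset $D \ni x$ if there are one or more samples $x \in D$ satisfying the following:

$$ s = x_{j:j+L-1} \quad \text{for some } j \in \{1, 2, \dots, T-L+1\}, \tag{1} $$

where $x_{a:b}$ denotes $(x_a, x_{a+1}, \dots, x_b)$. The distance between an arbitrary subsequence $s'$ of length $L$ and $x$, $d(s', x)$, is defined as presented in equation (2):

$$ d(s', x) = \min_{j \in \{1, \dots, T-L+1\}} \operatorname{dist}\big(s', x_{j:j+L-1}\big), \tag{2} $$

where $\operatorname{dist}(\cdot, \cdot)$ is the Euclidean distance between two vectors. A shapelet is defined as the subsequence whose distance to each class maximizes the class relevance in time series classification problems (Ye and Keogh, 2009). In other words, the shapelet is the subsequence that minimizes the loss function of a classifier when its distance to each sample is used as a feature, as follows:

$$ s^{*} = \arg\min_{s \in \mathcal{S}} \mathcal{L}_f\big(d(s, D), \mathbf{y}\big), \tag{3} $$

where $\mathcal{L}_f(\cdot)$ is a loss function for a classifier $f$, $\mathcal{S}$ is the set of all possible subsequences, $d(s, D)$ is the vector of distances between $s$ and every sample in $D$, and $\mathbf{y}$ is the label vector.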
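Equation (2) can be sketched in a few lines of NumPy. This assumes equal-length Euclidean windows as described and uses `numpy.lib.stride_tricks.sliding_window_view` (NumPy 1.20 or later); the function name is illustrative only.

```python
import numpy as np


def shapelet_distance(s, x):
    """d(s, x) in equation (2): the minimum Euclidean distance between s and
    every window of x that has the same length as s."""
    s = np.asarray(s, dtype=float)
    x = np.asarray(x, dtype=float)
    if len(s) > len(x):
        raise ValueError("the subsequence is longer than the time series")
    # Row j holds one window of length len(s); shape (T - L + 1, L)
    windows = np.lib.stride_tricks.sliding_window_view(x, len(s))
    return float(np.min(np.linalg.norm(windows - s, axis=1)))


x = [0.0, 1.0, 3.0, 2.0, 0.0]
print(shapelet_distance([3.0, 2.0], x))  # 0.0, exact match of the third and fourth values
```

Evaluating this function for every sample in D gives the feature column d(s, D) that enters the loss in equation (3).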
Since the search space of equation (3) is too large to explore exhaustively, many heuristic approaches have been proposed to find shapelets under several assumptions. For example, Grabocka et al. (2014) developed a learning method to estimate shapelets of a given length. The method initializes shapelet candidates randomly and updates them using stochastic gradient descent to minimize the loss function. Note that a shapelet found by heuristic approaches may not satisfy equation (3). Even worse, it may not even be a subsequence satisfying equation (1).

RUL shapelet
RUL shapelets are defined as subsequences containing information about the RUL: one can estimate the RUL of a run-to-failure time series $x_{1:t}$ based on its distances to them (Malinowski et al., 2015). More formally, an RUL shapelet is expressed as a tuple $(s, \delta, r)$, where $s$ is a subsequence, $\delta$ is a threshold for the distance between $s$ and $x_{1:t}$, and $r$ is the estimated RUL. That is, we estimate the RUL of $x_{1:t}$ as $r$ when $s$ matches $x_{1:t}$, i.e., when $d(s, x_{1:t}) \le \delta$. Each element of the RUL shapelet is obtained as follows. $s$ is one of the cluster centers obtained by applying the k-means clustering algorithm to every subsequence of length $l = 2, 3, \dots, K$, where $K$ is the maximum length of RUL shapelets. Because $s$ is a cluster center, it is close to other subsequences. The threshold $\delta$ is calculated as the minimum distance between $s$ and the $i$th sample $x^{(i)}$, as presented in equation (4). Let $r^{(i)}$ denote the RUL of $x^{(i)}$ after $s$ matches it, as defined in equation (5), and let $\tilde{r}^{(j)}$ be the $j$th smallest value among $r^{(i)}$ over all $i$. Then, $r$ is the mean of $\{\tilde{r}^{(j)} \mid j = 1, 2, \dots, \tilde{k}\}$, where $\tilde{k}$ is calculated as presented in equation (6) using $\sigma_j^2$, the variance of $\tilde{r}^{(1)}, \tilde{r}^{(2)}, \dots, \tilde{r}^{(j)}$.
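The matching rule and the averaging step can be sketched as below. Because equations (4) to (6) are not reproduced in the text, the sketch makes two simplifying assumptions that are not from Malinowski et al. (2015): the RUL after matching is the lifetime minus the first time the threshold is met, and the number of averaged RULs is a fixed parameter `k` instead of the variance-based choice of equation (6). The function names are hypothetical.

```python
import numpy as np


def first_match_time(s, x, delta):
    """Smallest 1-based t with d(s, x_{1:t}) <= delta, i.e. the end index of the
    first length-L window of x within Euclidean distance delta of s; None if none."""
    s, x = np.asarray(s, dtype=float), np.asarray(x, dtype=float)
    L = len(s)
    for end in range(L, len(x) + 1):
        if np.linalg.norm(x[end - L:end] - s) <= delta:
            return end
    return None


def rul_shapelet_estimate(s, delta, train_series, k):
    """Estimate r for the tuple (s, delta, r) from run-to-failure training series.

    Assumptions: each series ends at failure, so T_i = len(x); the RUL after
    matching is T_i - t*; r is the mean of the k smallest such RULs.
    """
    ruls = []
    for x in train_series:
        t_star = first_match_time(s, x, delta)
        if t_star is not None:
            ruls.append(len(x) - t_star)
    ruls.sort()
    return float(np.mean(ruls[:k])) if ruls else None


train = [[0, 0, 5, 6, 0, 0, 0], [0, 5, 6, 0, 0]]
print(rul_shapelet_estimate([5, 6], 0.1, train, k=2))  # 2.5
```

In the example, the shapelet first matches at t = 4 in the first series (RUL 3) and at t = 3 in the second (RUL 2), so averaging both gives 2.5. Averaging only the smallest RULs is the step that equation (6) refines.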

Genetic algorithm
GA is one of the most widely used metaheuristic algorithms for various time series data analysis problems. It was proposed in the early 1970s, but it is still a powerful method and has been employed in recent research on problems such as feature selection (Ahn and Hur, 2020), hyperparameter tuning (Ahn and Hur, 2020), and shapelet selection (Xue et al., 2020).
In the first step, a set of solutions called the population is initialized. In the second step, the solutions in the current population are evaluated using a fitness function, and the solutions with the highest fitness scores are selected. In the third step, children of the selected solutions are generated using crossover and mutation operators. A crossover operator generates a child from two randomly selected solutions (called parents), and a mutation operator adds variation to the child to prevent most solutions in the population from becoming similar to each other. In the fourth step, the generated children and the selected solutions compose a new population. If the termination condition is satisfied, the best solution found so far is returned; otherwise, steps (2) to (4) are repeated. The solution representation, crossover operator, and mutation operator should be chosen according to the specific purpose. For example, each solution can be represented as a binary vector for a feature selection problem (Ahn and Hur, 2020). As another example, each solution can be represented as a set of shapelets for a shapelet selection problem (Vandewiele et al., 2021). The best-known crossover operators are the one-point (see Fig. 2(a)), two-point (see Fig. 2(b)), and uniform (see Fig. 2(c)) crossover operators. As seen in this figure, crossover operators pick crossover points at random, and components of the parent solutions (i.e., genes) are swapped to generate children.
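The three crossover operators in Fig. 2 can be sketched for fixed-length gene lists as follows. Each function returns one child (the complementary child is obtained by swapping the parents); the function names are illustrative, and the two-point version assumes at least three genes.

```python
import random


def one_point(p1, p2, rng):
    """Genes before a random cut come from p1, the rest from p2."""
    c = rng.randint(1, len(p1) - 1)  # cut strictly inside the list
    return p1[:c] + p2[c:]


def two_point(p1, p2, rng):
    """The middle segment between two random cuts comes from p2."""
    a, b = sorted(rng.sample(range(1, len(p1)), 2))
    return p1[:a] + p2[a:b] + p1[b:]


def uniform(p1, p2, rng):
    """Each gene is taken from either parent with probability 0.5."""
    return [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]


rng = random.Random(42)
print(one_point([0] * 6, [1] * 6, rng))  # zeros followed by ones; the cut point is random
```

Using all-zero and all-one parents makes the origin of each gene visible, which is a quick way to check that an operator cuts where intended.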
Examples of mutation operators are the flip bit operator, Gaussian operator, and so forth. The flip bit operator selects some genes at random and converts the genes into 1 if they are 0, and into 0 otherwise. Gaussian operators select some genes at random and add Gaussian noise to them.
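The two mutation operators can be sketched as follows. Applying the mutation to each gene independently with probability `rate` is one common way to "select some genes at random" and is an assumption here, as are the parameter names.

```python
import random


def flip_bit(genes, rate, rng):
    """Flip each binary gene (0 <-> 1) independently with probability rate."""
    return [1 - g if rng.random() < rate else g for g in genes]


def gaussian(genes, rate, sigma, rng):
    """Add N(0, sigma^2) noise to each real-valued gene with probability rate."""
    return [g + rng.gauss(0.0, sigma) if rng.random() < rate else g for g in genes]


rng = random.Random(0)
print(flip_bit([0, 1, 1, 0], 1.0, rng))  # [1, 0, 0, 1]: rate 1.0 flips every gene
```

Flip-bit mutation suits binary representations such as feature masks, whereas Gaussian mutation suits real-valued genes.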

Problem statement
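Suppose we have run-to-failure data including $N$ samples of different lengths, $x^{(i)} = (x^{(i)}_1, x^{(i)}_2, \dots, x^{(i)}_{T_i})$ for $i = 1, 2, \dots, N$, where $x^{(i)}_t$ is the value measured immediately after time $t$ from when the equipment described by the data started to be used. Additionally, $T_i$ is the lifetime of the equipment. Based on $T_i$, we can label the RUL for all $i$ and $t$ as presented in equation (7):

$$ y^{(i)}_t = \frac{T_i - t}{T_i}, \tag{7} $$

where $y^{(i)}_t$ is the label for $x^{(i)}_t$. We use the relative RUL instead of the absolute RUL (i.e., $T_i - t$) for effective machine learning modeling. Using the label, we convert the run-to-failure data into a training dataset as in equation (8):

$$ D = \big\{ \big(x^{(i)}_{1:t}, y^{(i)}_t\big) \mid i = 1, 2, \dots, N;\ t = 1, 2, \dots, T_i \big\}. \tag{8} $$

We use $x^{(i)}_{1:t}$ instead of $x^{(i)}_t$ to extract the cumulative information up to $t$ for predicting the RUL at $t$.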
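Equations (7) and (8) amount to the following construction, sketched under the assumption that each run ends at failure with one measurement per time unit, so that T_i equals the series length. The function names are hypothetical.

```python
def relative_rul_labels(T):
    """Equation (7): y_t = (T - t) / T for t = 1, ..., T (0 at failure)."""
    return [(T - t) / T for t in range(1, T + 1)]


def build_prefix_dataset(series_list):
    """Equation (8): one (x_{1:t}, y_t) pair per time point t of every run."""
    data = []
    for x in series_list:
        T = len(x)  # assumed lifetime
        for t, y in enumerate(relative_rul_labels(T), start=1):
            data.append((list(x[:t]), y))  # prefix x_{1:t} keeps all history up to t
    return data


print(relative_rul_labels(4))  # [0.75, 0.5, 0.25, 0.0]
```

Dividing by T_i puts runs of very different lifetimes on a common [0, 1) scale, which is the motivation the text gives for using relative RUL.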
The problem considered in this paper is to select a set of RUL shapelets, $S = \{s_1, s_2, \dots, s_m\}$, from the set of all possible subsequences $\mathcal{S}$ with three objectives: (1) to minimize the error of RUL prediction, (2) to minimize the number of RUL shapelets, and (3) to minimize redundancy among the shapelets. We define RUL shapelets as subsequences whose distances to $x^{(i)}_{1:t}$ are used as a feature vector for ML- or DL-based RUL prediction models. In other words, we predict $y^{(i)}_t$ with a trained regression model $f$ as presented in equation (9):

$$ \hat{y}^{(i)}_t = f\big(d(s_1, x^{(i)}_{1:t}), d(s_2, x^{(i)}_{1:t}), \dots, d(s_m, x^{(i)}_{1:t})\big). \tag{9} $$
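The three objectives can be mathematically expressed as in equations (10), (11), and (12), respectively:

$$ \text{minimize } \sum_{(x^{(i)}_{1:t},\, y^{(i)}_t) \in D} \ell\big(y^{(i)}_t, \hat{y}^{(i)}_t\big), \tag{10} $$

$$ \text{minimize } |S|, \tag{11} $$

$$ \text{minimize } \sum_{1 \le j < k \le m} \big|\rho(\mathbf{d}_j, \mathbf{d}_k)\big|, \tag{12} $$

where $\ell(\cdot, \cdot)$ is a prediction error measure, $\mathbf{d}_j$ is the vector of distances $d(s_j, x^{(i)}_{1:t})$ over all elements of $D$, and $\rho(\cdot)$ is the Pearson correlation coefficient, which is adopted because it is frequently used to measure redundancy among the features of a supervised model (Nasir et al., 2020), and the RUL shapelets are also features. In addition to these three objectives, the efficiency of exploring $\mathcal{S}$ should also be considered. That is, we cannot solve the problem by comparing all possible candidate sets due to the large search space (i.e., the number of all possible candidate sets is $2^{|\mathcal{S}|}$ in the worst case).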
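For a candidate set S, the three objective values can be computed from the distance feature matrix as sketched below. The concrete choices here (MAE as the prediction error and the sum of absolute pairwise Pearson correlations as the redundancy) are assumptions made for illustration, and the function name is hypothetical.

```python
from itertools import combinations

import numpy as np


def objective_values(F, y, y_hat):
    """Return (error, size, redundancy) for one candidate shapelet set.

    F: (n_rows, m) array; column k holds d(s_k, x^{(i)}_{1:t}) for every row (i, t).
    Assumed forms: error = MAE, size = m, redundancy = sum over pairs of |Pearson r|.
    A constant column makes Pearson r undefined (nan).
    """
    F = np.asarray(F, dtype=float)
    error = float(np.mean(np.abs(np.asarray(y, float) - np.asarray(y_hat, float))))
    m = F.shape[1]
    redundancy = sum(
        abs(np.corrcoef(F[:, a], F[:, b])[0, 1]) for a, b in combinations(range(m), 2)
    )
    return error, m, float(redundancy)


F = [[1, 2, 5], [2, 4, 3], [3, 6, 1]]  # columns 2 and 3 are exact linear functions of column 1
print(objective_values(F, [1, 0, 0], [1, 0, 1]))
```

In the example every pair of columns is perfectly (anti-)correlated, so the redundancy reaches its maximum of 3 for three shapelets: any one column carries all the information of the others.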

Proposed algorithm
This section proposes the algorithm to compose a set of RUL shapelets using GA. The overall flow chart to select RUL shapelets is illustrated in Fig. 3 and Algorithm 1.
Each solution of the proposed algorithm is a set of RUL shapelets, and the algorithm selects a solution by generating and evaluating many candidate solutions. The specific process is as follows. As the first step, it generates initial solutions as follows. (1) The number of RUL shapelets in a solution is sampled at random. (2) The range [0, 1] is split into that many intervals at random. (3) The training data are split according to the intervals. (4) For each interval, the centroids of the subsequences with length $l$ ($l = 2, 3, \dots, K$) are calculated, and the centroid with the highest correlation with the RUL is selected as an RUL shapelet. As a result, the lengths of the RUL shapelets in the same solution can differ from each other.
In the second step, solutions are evaluated using two fitness functions (i.e., a relevance fitness function and a redundancy fitness function), and the $P - C$ solutions with the highest fitness are selected, where $P$ is the population size and $C$ is the number of children generated per generation. We call them parents. In the third step, two parents are selected at random, and a child is generated using a crossover operator. This is repeated $C$ times, after which a new generation with $P - C$ parents and $C$ children is composed. Finally, the algorithm repeats the second and third steps and returns the best solution ever found during the iterations. Fig. 4 visualizes the proposed algorithm for the reader's understanding.
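The generational loop described above (keep the fittest solutions as parents, add children produced by crossover of random parent pairs, and remember the best solution ever seen) can be sketched generically. The signature, the parameter names `P` and `C`, and the toy bit-string problem in the usage lines are illustrative assumptions, not the authors' code.

```python
import random


def ga_select(init, fitness, crossover, P, C, generations, seed=0):
    """Generic generational loop: P solutions, C children per generation.

    init(rng) -> solution; fitness(solution) -> float (higher is better);
    crossover(parent1, parent2, rng) -> child. Requires P - C >= 2.
    """
    rng = random.Random(seed)
    population = [init(rng) for _ in range(P)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Second step: keep the P - C fittest solutions as parents
        parents = sorted(population, key=fitness, reverse=True)[: P - C]
        # Third step: C children, each from two randomly chosen parents
        children = [crossover(*rng.sample(parents, 2), rng) for _ in range(C)]
        population = parents + children
        best = max(population + [best], key=fitness)  # best solution ever found
    return best


# Toy usage: maximise the number of ones in a 10-bit list.
def init(rng):
    return [rng.randint(0, 1) for _ in range(10)]


def cut(p1, p2, rng):
    c = rng.randint(1, 9)
    return p1[:c] + p2[c:]


print(sum(ga_select(init, sum, cut, P=20, C=10, generations=30)))
```

Because the best solution is carried over explicitly, its fitness can never decrease from one generation to the next, even if crossover produces only weak children.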

Properties of good RUL shapelets
Before describing the proposed method in detail, we discuss some properties of good RUL shapelets. First, even though the distances to RUL shapelets and the RULs should be correlated, selecting RUL shapelets based only on this correlation may lead to focusing on the later part of the time series because of the distance property presented in equation (13):

$$ d(s, x_{1:t+1}) \le d(s, x_{1:t}) \quad \text{for all } t. \tag{13} $$

In other words, for any two time points $t_a$ and $t_b$ with $t_b > t_a$, it would be rare for a subsequence that usually occurs at $t_a$ to have a higher correlation with the RUL than a subsequence that usually occurs at $t_b$. This is because, by equation (13), the distance to a subsequence that occurs early stops decreasing once it is matched, even though the RUL keeps decreasing. Since sets of RUL shapelets that occur only near the end of the time series are not appropriate for RUL prediction, we should consider the locations of RUL shapelets as well as their correlation with the RUL.
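The property in equation (13), read here as the statement that d(s, x_{1:t}) can only shrink as t grows (a longer prefix contains more windows to take the minimum over), can be checked numerically with a running minimum. The helper name is hypothetical, and the Euclidean distance of equation (2) is assumed.

```python
import numpy as np


def prefix_distances(s, x):
    """d(s, x_{1:t}) for t = len(s), ..., len(x) (Euclidean, as in equation (2)).

    Entry t of the result is the minimum over all windows that end at or before t,
    so the sequence is a running minimum and can never increase.
    """
    s, x = np.asarray(s, dtype=float), np.asarray(x, dtype=float)
    window_d = np.linalg.norm(
        np.lib.stride_tricks.sliding_window_view(x, len(s)) - s, axis=1
    )
    return np.minimum.accumulate(window_d)


d = prefix_distances([1, 2], [0, 0, 1, 2, 0, 0])
print(np.round(d, 3))  # values 2.236, 1.414, 0, 0, 0: the running minimum never increases
```

Once the exact match at positions 3 and 4 is seen, the distance stays at 0 for the rest of the run while the RUL keeps falling, which is why an early-matching subsequence correlates weakly with the RUL.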
Second, good RUL shapelets should occur at similar locations in every time series sample. Fig. 5 illustrates an example of good and bad RUL shapelets with two time series samples $x^{(1)}$ and $x^{(2)}$ and three RUL shapelets $s_1$, $s_2$, and $s_3$. As seen in this figure, $s_1$ occurs at the interval [100%, 80%] of both samples but also appears at [80%, 60%] in $x^{(1)}$. $s_2$ occurs in the first sample only, and $s_3$ occurs at the interval [40%, 20%] in both samples but does not occur at any other interval. Therefore, we can say that $s_1$ and $s_2$ are bad RUL shapelets, but $s_3$ is a good RUL shapelet.
Third, two or more good RUL shapelets in the same interval cause redundancy, which may decrease the RUL prediction accuracy of a supervised model. In contrast, good RUL shapelets from different intervals interact with each other to increase accuracy.
We explain the RUL prediction process with good RUL shapelets $s_4$, $s_5$, and $s_6$ at time $t \in \{t_1, t_2, t_3\}$, as presented in Fig. 6. In this figure, a red dashed line means that an RUL shapelet is matched to the corresponding subsequence. For example, $s_4$ is matched to the subsequence in the interval $[t_0, t_1)$, and its distance is 2. The distances to the RUL shapelets at each time are used as the feature vector of the regression model $f$. For example, the distances are 2, 10, and 15 at $t_1$; accordingly, (2, 10, 15) is used as the feature vector. Note that the distance to each shapelet does not change after matching because of the property described in equation (13). For example, the distance to $s_4$ does not change after $s_4$ matches the subsequence in the interval $[t_0, t_1)$. Note also that one cannot know for certain whether a shapelet is matched until the entire time series sample is observed.
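The feature vector in Fig. 6 is simply the list of prefix distances to every shapelet at the current time. The sketch below uses the Euclidean distance of equation (2) and returns infinity for a shapelet that is longer than the observed prefix; that convention and the function name are assumptions.

```python
import math

import numpy as np


def feature_vector(shapelets, x, t):
    """(d(s_1, x_{1:t}), ..., d(s_m, x_{1:t})) for the regression model f in equation (9)."""
    prefix = np.asarray(x[:t], dtype=float)
    feats = []
    for s in shapelets:
        s = np.asarray(s, dtype=float)
        if len(s) > len(prefix):
            feats.append(math.inf)  # no complete window observed yet
            continue
        windows = np.lib.stride_tricks.sliding_window_view(prefix, len(s))
        feats.append(float(np.min(np.linalg.norm(windows - s, axis=1))))
    return feats


x = [1, 2, 3, 5]
print(feature_vector([[1, 2], [5]], x, t=2))  # [0.0, 3.0]
print(feature_vector([[1, 2], [5]], x, t=4))  # [0.0, 0.0]
```

As in Fig. 6, the first feature stays fixed once its shapelet has matched, while the second drops only when the value 5 finally appears.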

Initialization
The properties of a good RUL shapelet can be summarized as follows. First, its distance to the run-to-failure time series samples is highly correlated with the RUL. Second, it occurs at similar intervals in most samples. Third, no two RUL shapelets in a solution should come from the same interval. Based on these properties, each initial solution is generated as follows. First, the number of RUL shapelets, $m$, is sampled from the discrete uniform distribution DU(2, M), where M is a user parameter indicating the maximum number of RUL shapelets. Second, [0, 1] is split into $m$ intervals at random as shown in equation (14), where V is the set of intervals and $\epsilon$ is a random variable that follows the continuous uniform distribution CU(−0.1, 0.1). Third, D is split according to the intervals as presented in equation (15), where $D_j$ is the $j$th part of the data, $v_j$ is the $j$th interval in V (i.e., $V = \{v_1, v_2, \dots, v_m\}$), and $v_j[0]$ and $v_j[1]$ are the lower and upper bounds of $v_j$, respectively. Fourth, the centroid $c_{j,l}$ of the subsequences with length $l$, $2 \le l \le K$, is calculated for every $D_j$ as shown in equation (16), where $K$ is a user parameter indicating the maximum length of RUL shapelets. Finally, for every $j$, the centroid whose distance to the time series is most correlated with the RUL is selected as $s_j$, as presented in equation (17):

$$ s_j = \arg\max_{c_{j,l},\ 2 \le l \le K} \big|\rho(\mathbf{d}_{j,l}, \mathbf{y})\big|, \tag{17} $$

where $\mathbf{d}_{j,l}$ is the vector of distances $d(c_{j,l}, x^{(i)}_{1:t})$ over all samples $i$ and time points $t$, and $\mathbf{y}$ is the corresponding vector of RUL labels. The whole process is summarized in Algorithm 2, which is repeated $P$ times, where $P$ is the population size, i.e., the number of solutions in each generation.

Algorithm 2 Population Initialization
Output: an offspring (a set of RUL shapelets)
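The initialization described above can be sketched compactly as one pass that produces one solution. Equations (14) to (16) are not reproduced in the text, so three details are assumptions: the interval boundaries are an even split of [0, 1] whose interior cut points are perturbed by CU(−0.1, 0.1) and then clipped and sorted; a subsequence belongs to interval v_j when its relative start position falls in v_j; and the centroid c_{j,l} is the plain mean of those subsequences. Distances are Euclidean here for brevity, and all names are hypothetical.

```python
import numpy as np


def prefix_dist(c, x):
    """d(c, x_{1:t}) for t = 1..len(x); +inf until a full window has been observed."""
    out = np.full(len(x), np.inf)
    if len(c) <= len(x):
        w = np.linalg.norm(np.lib.stride_tricks.sliding_window_view(x, len(c)) - c, axis=1)
        out[len(c) - 1:] = np.minimum.accumulate(w)
    return out


def init_solution(series, M, K, rng):
    """One initial solution: at most one shapelet per random interval of relative time."""
    series = [np.asarray(x, dtype=float) for x in series]
    m = int(rng.integers(2, M + 1))  # m ~ DU(2, M)
    cuts = np.arange(1, m) / m + rng.uniform(-0.1, 0.1, m - 1)
    bounds = np.concatenate(([0.0], np.sort(np.clip(cuts, 0.0, 1.0)), [1.0]))
    solution = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):  # interval v_j = [lo, hi)
        best, best_r = None, -1.0
        for l in range(2, K + 1):
            subs = [x[j:j + l] for x in series for j in range(len(x) - l + 1)
                    if lo <= j / len(x) < hi]
            if not subs:
                continue
            c = np.mean(subs, axis=0)  # centroid c_{j,l}
            d, y = [], []
            for x in series:
                T, pd = len(x), prefix_dist(c, x)
                for t in range(l, T + 1):  # prefixes with a finite distance
                    d.append(pd[t - 1])
                    y.append((T - t) / T)  # relative RUL, equation (7)
            r = abs(np.corrcoef(d, y)[0, 1]) if np.std(d) > 0 and np.std(y) > 0 else 0.0
            if r > best_r:
                best, best_r = c, r
        if best is not None:
            solution.append(best)
    return solution


rng = np.random.default_rng(0)
runs = [np.concatenate([np.zeros(n - 5), np.arange(1.0, 6.0)]) for n in (20, 25, 30)]
print([len(s) for s in init_solution(runs, M=3, K=4, rng=rng)])
```

Drawing at most one shapelet per interval is what keeps redundant same-interval shapelets out of the initial population, and selecting the length per interval is why shapelet lengths within one solution can differ.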

Evaluation and crossover
Based on the properties of good RUL shapelets, we propose two fitness functions: a relevance fitness function and a redundancy fitness function. The former evaluates an offspring $S = \{s_1, s_2, \dots, s_m\}$ in terms of the correlation between the RUL and the minimum cosine distance of $s_k$ ($k = 1, 2, \dots, m$) to $x^{(i)}_{1:t}$. To be more specific, let $d^{(i)}_{t,k}$ be the minimum cosine distance between $x^{(i)}_{1:t}$ and $s_k$, which is obtained as shown in equation (18):

$$ d^{(i)}_{t,k} = \min_{1 \le j \le t - l_k + 1} \operatorname{cd}\big(s_k, x^{(i)}_{j:j+l_k-1}\big), \tag{18} $$

where $l_k$ is the length of $s_k$ and $\operatorname{cd}(s_k, x^{(i)}_{j:j+l_k-1})$ is the cosine distance between $s_k$ and $x^{(i)}_{j:j+l_k-1}$. Here, we use the cosine distance instead of other distance measures, such as the Euclidean distance, to measure shape similarity, because the degradation patterns of RUL shapelets are, in general, more related to shapes than to values. The relevance fitness function value $f_1(S)$ of $S$ is calculated as the mean absolute Pearson correlation coefficient between $\mathbf{d}_k = (d^{(i)}_{t,k})$ and $\mathbf{y}$, as in equation (19):

$$ f_1(S) = \frac{1}{m} \sum_{k=1}^{m} \big|\rho(\mathbf{d}_k, \mathbf{y})\big|, \tag{19} $$

where $\rho$ is the Pearson correlation coefficient. The redundancy fitness function value $f_2(S)$ evaluates an offspring in terms of the correlations among the feature values $d^{(i)}_{t,k}$, as shown in equation (20). Every offspring $S$ is evaluated as $f_1(S) \times f_2(S)$. The offspring in the population with the highest evaluation scores are selected, and new offspring are generated using one-point crossover, as illustrated in Fig. 2(a).
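The two fitness functions can be sketched as follows. The cosine distance and the relevance term follow the description above; the exact form of equation (20) is not given in the text, so the redundancy term f2 = 1 − (mean pairwise |Pearson r|), where higher means less redundant so that the product f1 × f2 is maximized, is an assumption, as are all function names. Every window is assumed to be a nonzero vector so that the cosine distance is defined.

```python
from itertools import combinations

import numpy as np


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def min_cosine_distance(s, prefix):
    """Equation (18): smallest cosine distance between s and any window of the prefix."""
    L = len(s)
    return min(cosine_distance(s, prefix[j:j + L]) for j in range(len(prefix) - L + 1))


def fitness(shapelets, prefixes, y):
    """f1(S) * f2(S) for one offspring S.

    prefixes: list of x^{(i)}_{1:t}, each at least as long as every shapelet; y: their labels.
    f1 = mean |Pearson(d_k, y)|; f2 = 1 - mean pairwise |Pearson(d_a, d_b)| (assumed form).
    """
    D = np.array([[min_cosine_distance(np.asarray(s, float), np.asarray(p, float))
                   for s in shapelets] for p in prefixes])
    m = D.shape[1]
    f1 = float(np.mean([abs(np.corrcoef(D[:, k], y)[0, 1]) for k in range(m)]))
    pair_r = [abs(np.corrcoef(D[:, a], D[:, b])[0, 1]) for a, b in combinations(range(m), 2)]
    f2 = 1.0 - float(np.mean(pair_r)) if pair_r else 1.0
    return f1 * f2


x = [1.0, 2.0, 3.0, 1.0, 5.0, 1.0]
prefixes = [x[:t] for t in range(2, 7)]
y = [(6 - t) / 6 for t in range(2, 7)]
print(fitness([[3.0, 1.0], [1.0, 5.0]], prefixes, y))
```

Because cosine distance ignores scale, a window such as (2, 4) is a perfect match for a shapelet (1, 2); this is the shape-over-value behavior the text motivates.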

Experiment
In this section, we conduct three experiments to validate the effectiveness of our method and show how to use it properly.

Experimental design
In the first experiment, we compare the proposed method with other similarity-based methods by applying them to several benchmark datasets. The methods used in the experiment and their hyperparameters are summarized in Table 1.
We compare every method under every combination of the hyperparameters presented in Table 1 in terms of the prediction score (PS) proposed by Saxena and Goebel (2008) for each run-to-failure dataset. The PS is defined as presented in equation (21).
where $y$ is an actual RUL and $\hat{y}$ is a predicted RUL. The specific procedure for a dataset, which consists of a training dataset and a test dataset, is as follows. First, we convert every sample in the dataset into the structure presented in equation (8). Second, we train a model with a specific hyperparameter combination $h$ and calculate the prediction score $PS_{n'}(h)$ for each test sample $n'$ ($n' = 1, 2, \dots, N'$), where $N'$ is the number of test samples, as given in equation (22). Here, $\hat{y}^{(n')}_t(h)$ is the value of $y^{(n')}_t$ predicted by the model with $h$. We do not use $x^{(n')}_{1:t}$ when $t < \lfloor T_{n'} \times 0.2 \rfloor$ or $t > \lfloor T_{n'} \times 0.9 \rfloor$ because estimating the RUL may be of little use when $t$ is too small or too large. The PS of each method is calculated as shown in equation (23), where $H$ is the set of all possible hyperparameter combinations and $|H|$ is its size. Note that, for methods with randomness, we calculate the PS 10 times and use the average for an objective comparison.
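Equation (21) is not reproduced in the text. Assuming it is the standard asymmetric scoring function commonly used with the C-MAPSS data, in which late predictions are penalized more heavily than early ones and lower is better, the per-sample score over the evaluation window can be sketched as follows. Summing rather than averaging over t in equation (22) is an assumption, as are the function names; the constants 13 and 10 were originally set for RUL measured in cycles.

```python
import math


def ps_term(y, y_hat):
    """Asymmetric score for one prediction: exp(-d/13) - 1 if d < 0 else exp(d/10) - 1,
    with d = y_hat - y. Late predictions (d > 0) cost more; 0 means a perfect prediction."""
    d = y_hat - y
    return math.exp(-d / 13.0) - 1.0 if d < 0 else math.exp(d / 10.0) - 1.0


def sample_ps(y_true, y_pred):
    """Score of one test run over floor(0.2 T) <= t <= floor(0.9 T), with 1-based t."""
    T = len(y_true)
    lo, hi = math.floor(T * 0.2), math.floor(T * 0.9)
    return sum(ps_term(y_true[t - 1], y_pred[t - 1]) for t in range(max(lo, 1), hi + 1))


print(ps_term(50, 60) > ps_term(50, 40))  # True: being 10 late costs more than 10 early
```

The asymmetry reflects maintenance practice: predicting failure too late risks an unplanned breakdown, whereas predicting it too early only wastes some remaining life.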
In the second experiment, we validate the initialization algorithm of the proposed method by comparing it with the random initialization employed in most GAs. The random initialization algorithm generates initial RUL shapelets, no more than the maximum number of RUL shapelets, by selecting a sample from the training dataset and randomly choosing a subsequence shorter than the maximum RUL shapelet length.
In the third experiment, we conduct a sensitivity analysis of the hyperparameters to show the relationship between the hyperparameters and the RUL prediction accuracy, and we suggest how to determine them. To find the most important hyperparameters, we conduct a repeated measures analysis of variance (RMANOVA) for each dataset, where the independent variables are the hyperparameters of the proposed method, including the regression model (Model), and the dependent variable is the mean MAE of the model under those hyperparameters. Then, we conduct the sensitivity analysis for the selected important hyperparameters while keeping the other parameters fixed.

Datasets
The datasets for the experiments are the C-MAPSS (commercial modular aero-propulsion system simulation) datasets provided by Saxena and Goebel (2008). These are simulation datasets obtained under several operating conditions and fault modes, and each of them can be distinguished according to the number of conditions and modes. Chao et al. (2021) used C-MAPSS to generate more realistic simulation datasets, considering real flight conditions. We call the dataset provided by Saxena and Goebel (2008) C-MAPSS (ver.1) and the dataset provided by Chao et al. (2021) C-MAPSS (ver.2). Each dataset has 14 health parameters, and we construct a new health index that is used as the time series to predict RUL, as presented in Zhu et al. (2021). The other datasets contain the lifetimes of Li-ion batteries measured under various room temperatures, provided by Goebel et al. (2008). These datasets are separated according to their operating conditions, but we combined them because each dataset has few samples and similarity-based methods can handle the combined data. We call the combined dataset Battery. Specific information on the datasets is summarized in Table 2.
As seen in Table 2, each dataset consists of a training dataset and a test dataset, and the C-MAPSS data consist of several datasets according to operating conditions and fault modes. Where there is no risk of confusion, we call each row of the table a dataset; thus, we have eight datasets.

Fig. 7 (a)-(h) compares the mean PSs of each method in each dataset.
As seen in this figure, our method outperforms the other methods for most datasets, especially the method of Malinowski et al. (2015) that first introduced RUL shapelets. As we mentioned earlier, Malinowski et al. (2015) ignored the property that good RUL shapelets should occur frequently only in a specific range. Our proposed method takes this characteristic of good shapelets, presented in subsection 4.1, into consideration, which improves its performance. More concretely, our method outperforms the method proposed by Malinowski et al. (2015) on all eight benchmark datasets. In addition, our method also outperforms previous similarity-based methods for most datasets. The average PS of our method is larger (i.e., worse) than that of Lyu's method only for two datasets, (a) C-MAPSS (ver.1) #1 and (f) C-MAPSS (ver.2) #2, where our method gives the second-best performance; this seems to come from a lack of data integrity or noise in the training data. Fig. 8 compares the mean PSs of the proposed and random initialization methods for several regression models. In this figure, the left white bars and right red bars are the average PSs using the random and proposed initialization methods, respectively.
From the figure, we can conclude that the proposed initialization method is better than random initialization, especially when the run-to-failure data are large. Specifically, the average PSs of our initialization method are smaller than those of random initialization for most datasets and regression models, except for 5 of the 56 results (eight datasets and seven regression models). Table 3 shows the RMANOVA results, where each cell denotes the F-value (p-value). For example, for C-MAPSS (ver.1) #1, the F-value and p-value of one of the hyperparameters are 1.08 and 0.3599, respectively.
As seen in this table, the p-values for K and Model are smaller than 0.01 for every dataset. In particular, Model has the largest F-value except for C-MAPSS (ver.1) #4 and C-MAPSS (ver.2) #1. Thus, we conclude that the regression model is the most important parameter and the maximum length of RUL shapelets, K, is the second most important parameter. Fig. 9 shows the average PS obtained by the sensitivity analysis for K = 5, 10, 15, and 20 when the model is SVR.
As seen in this figure, the larger K is, the smaller the PSs are for all datasets except C-MAPSS (ver.2) #3 and Battery. Even for those two datasets, however, the PSs are the smallest when K = 20. Fig. 10 shows the sensitivity analysis results for each regression model when K is fixed at 20.
As seen in this figure, non-tree models, including Lasso, LR, SVR, and MLP, show better results than tree-based models such as DT and RF.
The experimental results can be summarized as follows. First, our method outperforms previous similarity-based methods, including the method of Malinowski et al. (2015) that first introduced RUL shapelets, for most datasets. Second, our initialization method shows better results than the random initialization method for most regression models and datasets. Third, the regression model and the length of RUL shapelets are the most and second most important parameters, respectively. Fourth, the larger the maximum length of RUL shapelets is, the smaller the RUL prediction errors are. Finally, non-tree models such as Lasso, LR, SVR, and MLP are more suitable than tree-based models such as DT and RF.

Conclusion
In this paper, we formulated RUL shapelet selection as a mathematical optimization problem with three objectives: 1) to minimize the error of RUL prediction, 2) to minimize the number of RUL shapelets, and 3) to minimize redundancy among the shapelets. In addition, we characterized some of the properties that a good set of RUL shapelets should possess. First, the RUL of a time series sample is proportional to the minimum distance to each shapelet. Second, good RUL shapelets should occur at similar locations in every time series sample and should not occur at other locations. Finally, two or more shapelets should not occur in the same interval. Based on these properties, we developed a GA-based RUL shapelet selection algorithm. This method selects frequent subsequences locally, not globally, and does not select two or more subsequences from the same interval. Our experiments validated that the proposed method outperforms previous methods. We also provided some guidelines for determining the hyperparameters and selecting the machine learning model. In addition, the proposed initialization method works well when the data are complicated or when the regression model is linear. Finally, a larger maximum length of RUL shapelets yields a smaller prediction error.
The limitations of the proposed method are as follows. First, the method can only be used when a one-dimensional health index exists; in other words, it works only for univariate time series. Second, the method is computationally expensive. It requires iterative computations such as splitting the dataset based on intervals, finding centroids, and performing crossover between two sets of RUL shapelets. Finally, it is difficult to interpret the RUL prediction result when there are many RUL shapelets. In particular, this paper focuses only on RUL prediction performance and does not propose an interpretation method using the selected RUL shapelets. For future research, we suggest extending the method to multivariate time series directly, without introducing a health index. We also suggest developing an approximation method for calculating the distance between a time series and RUL shapelets, and a method for reducing the number of candidates for faster search. Finally, an interpretation method is necessary to use the proposed method in practice.

Author contribution statement
Gilseung Ahn: Conceived and designed the experiments; Analyzed and interpreted the data; Wrote the paper. Min-Ki Jin; Seok-Beom Hwang: Performed the experiments; Contributed reagents, materials, analysis tools or data. Sun Hur: Analyzed and interpreted the data; Wrote the paper.

Funding statement
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Data availability statement
Data will be made available on request.

Declaration of interests statement
The authors declare no conflict of interest.

Additional information
No additional information is available for this paper.